A STUDY OF ERRORS CAUSED BY TRANSCRIPTION MISTAKES IN FORTRAN
PROGRAMS

LLOYD D. FOSDICK


Introduction 

Transcription mistakes are a common kind of mistake made in the
construction of programs. Often they occur when a program is
transcribed from a handwritten form into a machine-readable form, but
they also occur when a program is transcribed from the mind of the
author onto paper, or from a flow diagram into a sequence of statements,
and indeed whenever transcription is performed. It is clear that these
mistakes are inevitable: no matter how much care is taken in the
preparation of a program, no matter how rigorously good principles of
design are followed, and no matter how much effort is invested in
proving a program, the chance of program errors caused by transcription
mistakes cannot be reduced to zero because human systems are not
perfect. Indeed there is a kind of uncertainty principle operating in
this domain because the act of verifying a program itself involves
transcription and is therefore vulnerable to these mistakes.

Depending on the care taken, some transcription mistakes will be
discovered by proof-reading and those which remain will be discovered,
if at all, by the phenomena they cause. When a transcription mistake
causes a syntax error it will be discovered easily; or, when a
transcription mistake causes an unusual construction to appear, as when
the FORTRAN statement

      X = X + 1.0

is erroneously transcribed as

      Y = X + 1.0

and Y is not a program variable, it too can be found easily. But when
π is written to fifteen significant figures as

      3.1415 93653 58979,

the fact that the seventh digit should be 2 instead of 3 will not be
discovered with comparable ease. Thus we arrive at the question which
occupies us here: What is the nature of the errors caused by
transcription mistakes, and what portion of them can be detected easily,
that is, at a cost comparable to the cost of compilation?

MCS 77-02194

The simplest and perhaps the most common kind of transcription
mistake is made with individual characters. The substitution of one
character for another is an example. Another kind is the confusion
of identifiers, where one is substituted for another. In a sense this
kind of mistake is more complex since it takes place at the word
level rather than the character level and, probably more importantly,
it involves memory. Still another kind is the omission of expressions,
statements, or even sequences of statements. Such mistakes are easily
caused by a lapse in attention and it is not uncommon that the omitted
text is preceded by a segment similar or identical to the end of the
omitted text. From the point of view of the mental processes involved
this kind of mistake seems almost as simple as the single character
mistake. However, our interests here do not require that we know why
a mistake was made or whether one is more complex than another. Our
interest is in the effects of a mistake.


In this paper the focus is on single character mistakes in
FORTRAN programs. The effects of several kinds of single character
mistakes on programs with different characteristics are considered and
we look briefly at the ability of some widely used compilers to detect
the errors caused by these mistakes. A Monte Carlo scheme is used to
generate an ensemble of programs containing errors from simulated
transcription mistakes. These errors are then analyzed and classified
according to the ease with which they may be detected. The difficulty
of the problem addressed here almost precludes deriving useful results
by formal analysis. However, in the next section a simple analysis of
this problem to predict the frequency of syntax errors is described and,
as we shall see, it yields results which are in good agreement with
those obtained from Monte Carlo sampling.

The idea of inserting simulated mistakes in programs has been
discussed by others. Weinberg and Gresset [1] used it to study the
error detecting capability of a FORTRAN compiler. It has been
advocated by Gilb [2] as a technique for measuring the number of
undetected errors in a program - adopting the ideas used by biologists
for measuring fish populations, etc. Recently Lipton and
Sayward [3] have suggested it as a mechanism for guiding the selection
of test data. The work reported here, while bearing some relation to
this other work, is different in its objectives from the work of Gilb
and that of Lipton et al., and is wider in scope than the work of
Weinberg and Gresset.

Prediction of Syntax Errors by Analysis.

Since short assignment statements appear to be the most common
kind of statement appearing in FORTRAN programs [4] we direct our
attention at them. Let α stand for any letter of the alphabet, and
ω for any letter or any of the ten decimal digits. All legal
three-character assignment statements will have the form

      α = ω.

Now count the number of ways in which exactly one of these characters
can be replaced by another FORTRAN character in such a way that a
syntactically correct statement results. For all such statements but
one this number is 60 (25 alternative letters for α plus 35 alternative
letters or digits for ω); for the one exception, namely the statement
E = D, this number is 61 because of the possibility E = D ⇒ END.
There are altogether 47 characters in the FORTRAN character set [6];
hence, ignoring the exception, each of the three characters has 46
possible replacements, and the probability that substitution of exactly
one of the three characters by another will yield a syntactically
correct statement is 60/138 = 0.43.

We can extend this straightforward analysis to longer statements
but the number of cases that need to be considered grows rapidly and
the computation becomes very tedious. A brief look at four-character
assignment statements is sufficient to illustrate this. Let σ stand
for + or -, δ for any decimal digit, and α and ω as before. There
are nine forms to be considered: α = αα, α = αδ, α = δδ, α = σα, α = σδ,
α = .δ, α = δ., αα = ω, αδ = ω. With each form we associate a weight, w,
which is the number of instances of that form; for example,
w(α = αα) = 26^3, w(α = αδ) = 26^2 * 10, and so forth. We distinguish
between the forms α = αα and α = αδ and do not lump them together as
α = αω because the third character can be changed to a digit or a
decimal point in the second form yielding a syntactically correct
statement (viz. A = A9 ⇒ A = 99, or A = A9 ⇒ A = .9) but this is not
true for the form α = αα. Similar considerations force distinction
of the nine forms listed above. For each of these cases we compute the
probability, p, that a single character substitution will yield a
syntactically correct statement just as for the three-character case;
for example, p(α = αα) = 0.48, p(α = αδ) = 0.54. Finally, we compute
the average probability p̄ that a single character substitution will
yield a syntactically correct statement; this is given by the usual
formula

      $$\bar{p} = \frac{\sum_i w(i)\, p(i)}{\sum_i w(i)}$$

where the sums extend over the nine cases. The result is p̄ = 0.51.
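
The counts quoted above can be checked mechanically. The following
Python sketch is an illustration only, not the procedure used in the
original analysis. It assumes a deliberately small model of validity: a
statement is taken to be legal if, after blanks are removed, it is END
or assigns a single optionally signed name or unsigned constant to a
name. The names FORTRAN_CHARSET, is_valid, count_valid_substitutions,
and probability are hypothetical.

```python
import re
import string

# The 47-character set: 26 letters, 10 digits, 11 special characters.
FORTRAN_CHARSET = string.ascii_uppercase + string.digits + " =+-*/(),.$"

# Assumed model of a short legal assignment: name = [sign] (name | constant).
_ASSIGNMENT = re.compile(
    r"[A-Z][A-Z0-9]*=[+-]?(?:[A-Z][A-Z0-9]*|\d+\.?\d*|\.\d+)"
)


def is_valid(statement):
    """Return True if the statement is legal under the simplified model."""
    text = statement.replace(" ", "")  # blanks are not significant
    return text == "END" or _ASSIGNMENT.fullmatch(text) is not None


def count_valid_substitutions(statement):
    """Count single-character substitutions that leave the statement legal."""
    count = 0
    for i, original in enumerate(statement):
        for ch in FORTRAN_CHARSET:
            if ch == original:
                continue
            if is_valid(statement[:i] + ch + statement[i + 1:]):
                count += 1
    return count


def probability(statement):
    """Fraction of all single-character substitutions that stay legal."""
    total = len(statement) * (len(FORTRAN_CHARSET) - 1)
    return count_valid_substitutions(statement) / total


if __name__ == "__main__":
    for stmt in ("A=B", "E=D", "A=BC", "A=B9"):
        print(stmt, count_valid_substitutions(stmt), round(probability(stmt), 2))
```

Under this model the sketch gives 60 for A=B and 61 for E=D, and
probabilities of 0.48 and 0.54 for A=BC (two letters on the right) and
A=B9 (a letter followed by a digit on the right). The model ignores
other short statements such as STOP, so it should be read as a check on
the arithmetic rather than as a FORTRAN syntax checker.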

When this analysis is extended to five-character assignment statements
57 forms are distinguished and similarly analyzed: for this class of
statements the average probability that a single character substitution
will yield a syntactically correct statement is p̄ = 0.56. This analysis
has not been extended to longer assignment statements because the number
of forms which need to be distinguished makes the problem almost
intractable.

On the basis of this approach we can estimate that a single
character substitution in a FORTRAN program has a slightly better than
50% chance of yielding a program that is syntactically correct. This
estimate is crude for a number of reasons which are evident from the
approach we have taken. However, we shall see that it agrees rather
well with the random sampling or Monte Carlo approach described below.

Monte  Carlo  Experiments  to  Simulate  Transcription  Mistakes. 

Four common transcription mistakes made in typing are simulated in
these experiments: substitution - the substitution of one character
for another; deletion - the omission of a character; insertion - the
insertion of a character; transposition - the interchange of adjacent
characters. All of these, except transposition, are single character
mistakes. Two of them, substitution and insertion, require the
introduction of a new character into the text and so the question of
how this new character is to be selected arises. In simulating
substitution mistakes we randomly selected a character from among the
correct character's nearest-neighbors on the keyboard of an IBM 026
keypunch. This rule was used to govern the selection because evidence
from experiments with typists shows that a nearest-neighbor is the most
likely character to be erroneously substituted [6]. However, in order
to explore the effect of another selection rule, a series of experiments
was made in which every character in the FORTRAN character set was made
an equally likely candidate for substitution. As will be seen, use of
this alternate rule had a noticeable effect on the results. Another
obvious choice, but not one considered here, is the character on the
same key but in the alternate shift mode - simulating failure to shift
from alphabetic to numeric or vice versa. For insertion mistakes the
alternate selection rule, all characters equally likely, was the only
rule used.

The character position in the program text where the mistake is
simulated was selected at random, giving each position equal probability
of selection, ignoring COMMENT statements and blank positions. When
the position was selected one instance of each kind of mistake was
simulated. This selection process was repeated fifty times, so for
each program text fifty samples were created for each kind of mistake
- thus two hundred and fifty samples of a particular text altogether:
fifty of substitution with nearest-neighbor character substituted,
fifty of substitution with any character substituted, fifty of deletion,
fifty of insertion with any character inserted, and fifty of
transposition. There is a correlation among samples arising from the
fact that, for a given position selection, each of the five kinds of
mistake appeared at the same place. This correlation permits a better
comparison of the effects of the different kinds of mistake.
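
The sampling scheme just described can be sketched in Python as
follows. This is a hedged illustration rather than the program used in
the experiments: the function names are hypothetical, the neighbor table
TOY_NEIGHBORS is a made-up stand-in for the IBM 026 keyboard layout
(which is not given in the text), and the choice of transposition
partner and the fallback for characters missing from the table are
assumptions.

```python
import random
import string

FORTRAN_CHARSET = string.ascii_uppercase + string.digits + " =+-*/(),.$"

# Hypothetical neighbor table for illustration only; it is NOT the real
# IBM 026 keyboard layout, which would have to be supplied.
TOY_NEIGHBORS = {"X": "ZC", "1": "2", "0": "9"}


def eligible_positions(lines):
    """List (line, column) pairs, skipping COMMENT lines and blanks."""
    return [
        (i, j)
        for i, line in enumerate(lines)
        if not line.startswith("C")  # fixed-form comment: C in column 1
        for j, ch in enumerate(line)
        if ch != " "
    ]


def apply_mistakes(line, j, rng, neighbors):
    """Return one sample of each of the five mistakes at column j."""
    c = line[j]
    others = [ch for ch in FORTRAN_CHARSET if ch != c]
    near = neighbors.get(c, others)  # assumed fallback: any other character
    # Assumption: transpose with the following character, or with the
    # preceding one when j is the last column of the line.
    transposed = line
    if len(line) > 1:
        k = j + 1 if j + 1 < len(line) else j - 1
        lo, hi = min(j, k), max(j, k)
        transposed = line[:lo] + line[hi] + line[lo] + line[hi + 1:]
    return {
        "SN": line[:j] + rng.choice(near) + line[j + 1:],
        "SR": line[:j] + rng.choice(others) + line[j + 1:],
        "DE": line[:j] + line[j + 1:],
        "IN": line[:j] + rng.choice(FORTRAN_CHARSET) + line[j:],
        "TR": transposed,
    }


def generate_samples(lines, n, rng, neighbors):
    """Pick n positions; at each, create one altered text per mistake kind."""
    positions = eligible_positions(lines)
    samples = []
    for _ in range(n):
        i, j = rng.choice(positions)
        for kind, new_line in apply_mistakes(lines[i], j, rng, neighbors).items():
            samples.append((kind, i, lines[:i] + [new_line] + lines[i + 1:]))
    return samples


if __name__ == "__main__":
    program = ["C     INCREMENT X", "      X = X + 1.0"]
    for kind, i, text in generate_samples(program, 2, random.Random(1), TOY_NEIGHBORS):
        print(kind, repr(text[i]))
```

Because all five mistakes are applied at the same selected position,
the samples are correlated in the way the text describes; calling
generate_samples with n = 50 yields the two hundred and fifty samples
per program text.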

The particular mistakes chosen for consideration here are no doubt
familiar to the reader who may draw on personal experience to decide
their relative likelihood. However, it is worth noting that
substitution and deletion errors together appear to be far more common
than insertion and transposition errors. In a study [7] of mistakes
made in keying cash amounts in a bank central office the following
frequencies were observed: substitution, 62.4%; deletion, 20.7%;
insertion, 6.0%; transposition, 1.5%; other, 9.4%. With specific
reference to these mistakes in FORTRAN text, James and Partridge [8]
made the following observations: substitution, 24%; deletion, 58%;
insertion, 18%; transposition, 0%. These figures are consistent with
the view that substitution and deletion are simpler actions than
insertion and transposition.

Four  program  texts,  taken  from  ACM  Transactions  on  Mathematical 
Software,  were  used  as  subjects: 

1.  Algorithm  495  -  Solution  of  an  Overdetermined  System  of 
Linear  Equations  in  the  Chebyshev  Norm  [9]; 

2.  Algorithm  498  -  Airy  Functions  Using  Chebyshev  Series 
Approximations  [10]; 


3.  Algorithm  505  -  A  List  Insertion  Sort  for  Keys  with  Arbitrary 
Key Distribution [11];

4. Algorithm 513 - Analysis of In-Situ Transposition [12].

There are significant differences between them. In Algorithm 495 a
two-dimensional array and two one-dimensional arrays are prominent and
there are no constants except a few small integers. In Algorithm 498
there is no two-dimensional array but there are some small
one-dimensional arrays used as tables for real constants: the large
number of real constants, nearly 200, it contains is a distinguishing
characteristic of this algorithm, and it is the only one to contain
WRITE and FORMAT statements. Algorithm 505 has variables and constants
of type integer only and it has a one-dimensional array of 57 integer
constants initialized in a DATA statement. Algorithm 513 has a
one-dimensional array of type real, all other variables are of type
integer, and it has just a few constants, all of type integer. There
are some differences in size: the number of lines, excluding COMMENT
lines, in these algorithms is 208, 249, 71, and 81, respectively; the
number of statements, excluding COMMENT statements, is 207, 168, 58,
and 81, respectively; the number of characters, excluding blanks and
COMMENT statements, is 2917, 6820, 1364, and 979, respectively.

After the samples were generated each was examined by eye to
determine the kind of error caused by the simulated mistake. Four kinds
of error were distinguished.

1.  Syntax  error:  A  violation  of  the  language  rules  determinable 
by  scanning  the  altered  statement  out  of  context. 

2.  Semantic error: A violation of the language rules determinable
at compile or load time and not included in 1.

3.  Anomalous  use  of  a  variable:  Exactly  one  appearance  of  a 
variable  name  in  a  program  unit,  or  use  of  a  local  variable 
only  in  a  referencing  context,  or  use  of  a  local  variable  only 
in  a  defining  context. 


4.  Other:  Anything  not  covered  by  1,  2,  or  3. 

The language rules referred to here are those for ANS FORTRAN 66 [6].
For a language like ALGOL or PASCAL the errors in the first category
could be defined with respect to the formal grammar used to define them,
but since FORTRAN 66 is defined only informally we are forced to use an
informal definition of syntax and semantic errors here. However, this
should not cause any serious misunderstanding. The nature of the
mistakes we are considering is such that they are likely to cause an
anomalous use of a variable to appear, and most of these are recognized
in category 3; however, they are included in this category only if
they are determinable without path tracing - i.e., without recognizing
the order in which statements are executed. It will be noted that no
path tracing is required to recognize the fact that a variable name
appears only once in a program unit, and provided it is not used in a
procedure call it is possible to determine whether a local variable is
used only in a referencing context, or only in a defining context. These
terms, reference and define, refer to fetching a value from memory and
assigning a value, respectively: in y = x+1, x is in a referencing
context and y is in a defining context. Any anomalies which would
require path tracing to detect them fall in category 4. A FORTRAN expert
will recognize that category 3 includes certain errors that might have
been placed in category 2 because it is a violation of the language to
use a variable in a referencing context before it has appeared in a
defining context and a variable used only in a referencing context is
surely such a violation. Nevertheless it seemed more sensible to
include these in category 3. One point about this classification needs
to be emphasized. The anomalies or errors included in the first three
categories are easy to detect and we should expect that a good compiler
will detect all of them. Those in category 4 on the other hand are
substantially more difficult to detect and while some might be
detectable by techniques of static analysis, others would not. In any
case we would expect to use data flow analysis, testing, or some other
technique to recognize them.
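
The category 3 checks need no path tracing, which the following Python
sketch tries to make concrete. It is a simplified illustration under
stated assumptions, not a tool used in this study: it handles only
scalar assignment statements, treats every name on the left of the
equals sign as defined and every name on the right as referenced, and
knows nothing about procedure calls, arrays, or keywords. The names
NAME and variable_anomalies are hypothetical.

```python
import re
from collections import Counter

# A name not preceded by a letter, digit, or decimal point, so that the
# exponent letter in a constant such as 1.0E5 is not taken for a variable.
NAME = re.compile(r"(?<![A-Z0-9.])[A-Z][A-Z0-9]*")


def variable_anomalies(assignments):
    """Flag anomalous variable use in a list of scalar assignments.

    The order of the statements is never consulted, i.e. no path tracing.
    """
    defined, referenced = Counter(), Counter()
    for stmt in assignments:
        left, right = stmt.upper().split("=", 1)
        defined.update(NAME.findall(left))       # defining context
        referenced.update(NAME.findall(right))   # referencing context
    total = defined + referenced
    return {
        "appears_once": sorted(n for n in total if total[n] == 1),
        "only_defined": sorted(n for n in defined if n not in referenced),
        "only_referenced": sorted(n for n in referenced if n not in defined),
    }


if __name__ == "__main__":
    unit = ["X = 1.0", "Y = X + 1.0", "Z = X + W"]
    print(variable_anomalies(unit))
```

For the three statements in the example, W, Y, and Z each appear
exactly once, Y and Z are used only in a defining context, and W is used
only in a referencing context, while X raises no flag.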

The results of this classification of the 1000 samples are
displayed in Fig. 1, where the abbreviations used are defined as
follows: SN, substitution mistake - nearest-neighbor substituted; SR,
substitution mistake - any character substituted; DE, deletion mistake;
IN, insertion mistake - any character inserted; TR, transposition
mistake. The large number of real constants in Algorithm 498 explains
why the SN, DE, and TR mistakes yield a high proportion of samples in
category 4: real constants tend to be converted into other real
constants. The SN mistakes cause a lower percentage of syntax errors
than SR mistakes because SN mistakes are more likely to substitute
another character of the same type. The actual probabilities are:
pr{<letter> ⇒ <letter>} = 85% (56%), pr{<digit> ⇒ <digit>} = 74% (20%),
pr{<sp. character> ⇒ <sp. character>} = 57% (18%), where the number
inside parentheses is the value for an arbitrary character substitution.
When the results in Fig. 1 for all four algorithms are combined the
distribution of errors over the four categories is: syntax error, 52%;
semantic error, 18%; anomaly, 16%; other, 14%. It is interesting to
note that the result obtained here for the frequency of syntax errors
agrees well with the result we obtained earlier by analysis.

Out of the one thousand samples, one hundred and forty fell in
the fourth category representing errors relatively difficult to
detect, and of these fifteen were one of the following types:
referencing an undefined variable, two definitions of a variable
without an intervening reference, a null statement (e.g. x = x). It is
reasonable to assume that these fifteen could be detected by data flow
analysis or simple matching (for the null statements). Thus it appears
that more than 10% of the errors caused by mistakes (125 of the 1000
samples, or 12.5%) would remain until execution time for their
detection, making generous assumptions about detection by static
analysis.

It is natural in considering these results to wonder about the
effectiveness of compilers in detecting these errors. Accordingly the
samples produced by the SN mistakes were submitted for compilation to
four different compilers: MNF, the University of Minnesota FORTRAN
compiler; FTN, the CDC FORTRAN compiler; FORTH, the IBM FORTRAN H-level
compiler; and WATFIV, the University of Waterloo compiler. In Fig. 2
the errors detected by these compilers are illustrated, with the number
of errors in the first three categories shown for reference (marked
EDE). It is evident that most of these compilers do little more than
catch the syntax errors and some of the semantic errors. No results
were obtained for WATFIV on Algorithm 498 because of difficulties caused
by the long DATA statements it contained.

Conclusion 

These results have three applications. They contribute towards
providing quantitative measures of the reliability of programs, they
provide a base for the comparison of similar phenomena in other
languages, and they provide a target at which the builders of FORTRAN
compilers can aim.

Our ability to provide some quantitative measure for the
reliability of a program is notably weak. In practice ad hoc techniques
based upon what appears to be reasonable are all that we have for
judging a program to be reliable or, to put it another way, for
estimating the number of errors it might contain. The results presented
here provide us with some assistance with this problem. It is
reasonable to assume that the density of transcription mistakes which
remain in a program after proof-reading is sufficiently low to treat
them as independent, non-interfering phenomena: if there is any doubt
as to the validity of this assumption one would have little difficulty
in testing it. Therefore by simply multiplying the frequency of
mistakes in the text after proof-reading by the numbers obtained here
we have an estimate of the number of mistakes which escape detection
by compilation and static analysis. The first factor can be measured
independently and will certainly depend on the quality of the typists
and proof-readers, but to show what might be expected from such a
calculation we use some data that are available. James and Partridge
[8], in a study of two hundred FORTRAN programs, composed of 20,121
statements altogether, found approximately three mistakes per thousand
statements, of which 90% were of the single character type: since we
can assume James and Partridge did not find them all and since it is
unclear precisely when, after proof-reading, they made their
observations, we conclude that the number of mistakes in the programs
after proof-reading was at least three per thousand statements. Other
data on keying errors support this. In a study of mistakes made in
keying statistical data, Deming, Tepping, and Geoffrey [13] found that
the "maximum error rate is one wrong card in one hundred cards punched."
In a study concerned with the design of keyboards Klemmer [14] observed
that "Experienced operators average 56,000 to 83,000 keystrokes per
day with 1,600 to 4,300 strokes per residual error" (a "residual error"
is one remaining after detection and correction of the text by the
typist); this translates to 0.25 to 0.6 residual mistakes per 1,000
characters. Klemmer's data are consistent with those obtained from the
Oxford University Press for operators of typesetting keyboards:
superior operators have an average error rate of about 0.5 residual
mistakes per 1,000 characters.

If we assume an average of about 14 (non-blank) characters per statement,
as is the case for Algorithm 495, then we might expect between 3.5 and
8.4 mistakes per one thousand statements. It is a matter of conjecture
as to how many of these might be caught in proof-reading. If we assume
the worst, that is none caught in proof-reading, and we take the results
obtained from the work reported here which show that about 85% of the
errors caused by typing mistakes could be caught during compilation and
static analysis, we obtain the result that after compilation and static
analysis we could expect between 0.5 and 1.3 mistakes per one thousand
statements. This result would be reduced in proportion with the number
of mistakes caught by proof-reading; but on the other hand we have seen
that existing compilers have a much poorer error detection rate than
85%, tending to increase this result in actual practice. The density of
mistakes remaining in a program when it is put into use, that is to say
after testing, can then be estimated once we have a quantitative measure
of test effectiveness.
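
The arithmetic behind this estimate is short enough to write out. The
Python sketch below simply restates the calculation in the text; the
function name and parameter names are hypothetical, and the inputs
(0.25 to 0.6 residual mistakes per 1,000 characters, 14 characters per
statement, an 85% detection rate) are the figures quoted above.

```python
def mistakes_per_1000_statements(mistakes_per_1000_chars,
                                 chars_per_statement,
                                 detection_rate):
    """Return (typed, remaining) mistakes per one thousand statements.

    typed     -- mistakes present before compilation, assuming none are
                 caught in proof-reading
    remaining -- mistakes left after compilation and static analysis
    """
    typed = mistakes_per_1000_chars * chars_per_statement
    remaining = typed * (1.0 - detection_rate)
    return typed, remaining


if __name__ == "__main__":
    for rate in (0.25, 0.6):
        typed, remaining = mistakes_per_1000_statements(rate, 14, 0.85)
        print(f"{rate:.2f} per 1,000 characters: "
              f"{typed:.1f} typed, {remaining:.1f} remaining")
```

With these inputs the calculation gives 3.5 and 8.4 mistakes per one
thousand statements before compilation, and about 0.5 and 1.3
afterwards, matching the range stated in the text.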

There is an intuitive feeling people have to the effect that
mistakes in programs written in Algol-like languages are less likely
than in programs written in FORTRAN. Now so far as the kind of mistakes
that we are treating here is concerned this difference, if it exists,
will be primarily due to the fact that an Algol-like language will make
the mistake easier to detect: it is not so likely that the language
difference would reduce the frequency of typing mistakes - indeed the
larger alphabet of Algol-like languages could serve to increase the
frequency of typing mistakes. An investigation carried out on programs
written in other languages like the one carried out here on programs
written in FORTRAN could resolve this issue and might provide some
clues to language features which enhance, or inhibit, error detection.

Finally, we have seen from the results presented here that there
appears to be considerable room for improvement in existing compilers.
The main area needing improvement is anomaly detection, though it must
be admitted that this is a difficult area to deal with because
increasing the reporting of anomalies tends to increase the false alarm
rate. Investigating the error detecting capability of existing FORTRAN
compilers has not been an important theme of this work; however, the few
results we have obtained in this direction suggest that further work
in this area could serve as a stimulus to compiler writers and as a
warning to the careless programmer who likes to leave it to the
compiler to find the mistakes.

Part of this work was done while I was a visitor with the
Numerical Algorithms Group in Oxford. I thank them for their
hospitality and I also thank C. W. Gear who kindly ran my samples on a
WATFIV compiler, and J. M. Boyle who did the same on an IBM FORTRAN
compiler.
Finally,  I  thank  Dan  Ruegg,  Mario  Escobar,  and  Carol  Drey  of  the 
University  of  Colorado  who  assisted  in  gathering  the  data  reported  here. 

Figure captions.

Figure 1: The effect of five classes of typing mistakes (SN,
substitution - nearest neighbor; SR, substitution - any character; DE,
deletion; IN, insertion; TR, transposition) on four algorithms. For
each case the first interval on the bar graph denotes "syntax"
errors, the second interval denotes "semantic" errors, the third
interval denotes "anomalous use of a variable," and the fourth
interval (shaded) denotes "other" errors. The fourth interval is
shaded to clearly distinguish the errors in this class which are
difficult to detect from those in the other three classes which are
relatively easy to detect.

Figure 2: The effectiveness of four FORTRAN compilers in detecting
errors caused by substitution (nearest neighbor) typing blunders.
The percent of errors detected is shown. Also shown (EDE) is the
percent of errors which are easy to detect, namely those in the three
classes: 1, syntax; 2, semantic; 3, anomalous use of a variable.